Castelo Branco
A Large-Scale Comparative Analysis of Imputation Methods for Single-Cell RNA Sequencing Data
Iwashita, Yuichiro, Abbasi, Ahtisham Fazeel, Kise, Koichi, Dengel, Andreas, Asim, Muhammad Nabeel
Background: Single-cell RNA sequencing (scRNA-seq) enables gene expression profiling at cellular resolution but is inherently affected by sparsity caused by dropout events, where expressed genes are recorded as zeros due to technical limitations. These artifacts distort gene expression distributions and compromise downstream analyses. Numerous imputation methods have been proposed to recover latent transcriptional signals. These methods range from traditional statistical models to deep learning (DL)-based methods. However, their comparative performance remains unclear, as existing benchmarks evaluate only a limited subset of methods, datasets, and downstream analyses. Results: We present a comprehensive benchmark of 15 scRNA-seq imputation methods spanning 7 methodological categories, including traditional and DL-based methods. Methods are evaluated across 30 datasets from 10 experimental protocols on 6 downstream analyses. Results show that traditional methods, such as model-based, smoothing-based, and low-rank matrix-based methods, generally outperform DL-based methods, including diffusion-based, GAN-based, GNN-based, and autoencoder-based methods. In addition, strong performance in numerical gene expression recovery does not necessarily translate into improved biological interpretability in downstream analyses, including cell clustering, differential expression analysis, marker gene analysis, trajectory analysis, and cell type annotation. Furthermore, method performance varies substantially across datasets, protocols, and downstream analyses, with no single method consistently outperforming others. Conclusions: Our findings provide practical guidance for selecting imputation methods tailored to specific analytical objectives and underscore the importance of task-specific evaluation when assessing imputation performance in scRNA-seq data analysis.
Time-Varying Network Driver Estimation (TNDE) Quantifies Stage-Specific Regulatory Effects From Single-Cell Snapshots
Identifying key driver genes governing biological processes such as development and disease progression remains a challenge. While existing methods can reconstruct cellular trajectories or infer static gene regulatory networks (GRNs), they often fail to quantify time-resolved regulatory effects within specific temporal windows. Here, we present Time-varying Network Driver Estimation (TNDE), a computational framework quantifying dynamic gene driver effects from single-cell snapshot data under a linear Markov assumption. TNDE leverages a shared graph attention encoder to preserve the local topological structure of the data. Furthermore, by incorporating partial optimal transport, TNDE accounts for unmatched cells arising from proliferation or apoptosis, thereby enabling trajectory alignment in non-equilibrium processes. Benchmarking on simulated datasets demonstrates that TNDE outperforms existing baseline methods across diverse complex regulatory scenarios. Applied to mouse erythropoiesis data, TNDE identifies stage-specific driver genes, the functional relevance of which is corroborated by biological validation. TNDE offers an effective quantitative tool for dissecting dynamic regulatory mechanisms underlying complex biological processes.